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ABSTRACT 
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^Th«  preconditioned  conjugate  gradient  algorithm  has  been  successfully  applied  to  solving 
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1.  INTRODUCTION 

Exploiting  the  spaisity  stnieture  of  larte  symmetric  positive  definite  matrix  A  when  solving 
linear  systems 

Ax  —  b  (1) 

has  become  an  essential  part  of  acceptable  solution  techniques.  Here  we  are  mainly  interested  in 

matrices  arising  from  finite  element  applications. 

The  conjugate  gradient  method  (CG  hereafter)  is  one  of  the  more  popular  iterative  methods 
for  solving  (1).  At  each  iteration  step  the  CG  method  requires  the  computation  of  A  v  for  a  given 
vector  V.  This  is  the  only  connection  between  A  and  the  algorithm  and  is  a  natural  way  of  mak¬ 
ing  use  of  the  sparsity  of  A. 

Past  experience  has  shown  that  the  application  of  the  CG  method  directly  to  (1)  may 
require  many  steps  to  reduce  the  residual  norm  to  an  acceptable  level.  This  difficulty  can  be 
overcome  through  preconditioning.  Instead  of  solving  (1),  we  attempt  to  solve 

M-‘Ax  -  M'‘b  (2) 

where  M*'  is  an  approximation  to  A'‘.  In  order  to  apply  CG  to  (2),  M  must  also  be  symmetric 

positive  definite. 

Preconditionings  based  on  SSOR,  fast  direct  methods,  or  on  the  block-diagonal  part  have 
been  effective  in  stimulating  research  in  this  area  (see  (2,3,8|  for  more  details).  One  of  the  suc¬ 
cessful  preconditionings  used  in  connection  with  CG  is  the  incomplete  Choleski  factorization  of  A 
(ICCG)  [9,11,13,15].  In  some  of  these  papers  truly  spectacular  numerical  results  are  reported. 
ICCG  type  methods  proved  to  be  far  superior  to  other  iterative  methods  and  demonstrated  the 
true  potential  of  preconditioned  CG  method. 

The  implementation  of  ICCG  type  methods  is  not  difficult  for  finite  difference  matrices  [13|. 
If  one  forces  the  incomplete  Choleski  factors  to  have  the  same  zero  structure  as  that  for  A  then 
for  general  matrices,  in  particular  for  finite  element  matrices,  the  data  structure  for  A  and  the 
incomplete  factors  are  the  same  and  the  same  array  of  pointers  can  be  used  for  both.  The  data 
structure  however,  becomes  more  complicated  if  one  accepts  or  rejects  fill-in  according  to  the 


rehtive  size  of  the  matrix  elements  as  in  (15).  For  this  algorithm  the  data  structure  has  to  be 
changed  during  the  process  of  incomplete  factorization.  This  complication  prompted  us  to  take  a 
simpler  approach  to  preconditioning.  We  partition  A  into  the  sum  of  two  matrices 
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Figure  1.  Partitioning  of  A  into  dense  matrix  M  and  the  remainder  R. 


where  M  is  a  symmetric  part  of  A  whose  Choleski  factors  has  little  or  no  fill-in  (see  figure  1).  We 
will  refer  to  M  as  the  dense  part  of  A.  Here  we  only  consider  the  banded  form  of  M,  but  profile 
or  any  other  sparse-factorizable  form  of  M  can  abo  be  used.  M~^  may  now  be  used  as  a  precondi¬ 
tioner. 

Considering  the  simplicity  of  thb  approach,  one  might  ask,  why  this  preconditioning  method 
has  not  been  tried  earlier.  Recently  some  similar  ideas  were  reported  in  [14].  One  apparent  rea¬ 
son  is  that  M  (the  dense  part  of  A)  b  not  positive  definite  in  general.  Indeed  thb  restricts  the 
applicability  of  our  method.  However  we  will  show  that  an  important  class  of  matrices  has  a 
positive  definite  dense  part.  A  special  case  of  this  splitting  that  is  emphasised  here  is  when  M  b 
banded.  We  refer  to  M  as  the  banded  part  of  A  however  all  the  results  that  follow  extend 
immediately  to  a  more  general  M  (profile  form  etc.). 


3.  Preeonditlonnd  CooJusato  Gradients 

The  initial  warm  embrace  given  to  conjugate  gradient  method  was  due  to  a  number  of  fac¬ 
tors.  In  exact  arithmetic  CG  cannot  take  more  than  n  steps  to  solve  Ax  »  b  for  positive  definite 
A.  Soon  it  was  found  that  under  certain  conditions  practical  CG  may  require  as  many  as  3n  or 
4r  steps  to  reduce  the  residual  to  an  acceptable  level  |4|.  This  blemish  can  be  accounted  for  by 
the  strong  influence  of  round-off  errors. 


.  4- 


The  resurrection  of  CG  is  due  to  the  addition  of  preconditioning.  The  original  ill* 
conditioned  problem  is  transformed  into  a  problem  that  is  better  conditioned.  There  are  some 
theoretical  arguments  as  to  why  preconditioning  is  a  reasonable  technique.  Standard  results  show 

that  the  residual  norm  is  reduced  by  a  factor  at  least  — —  at  each  iteration,  k  is  the  condi- 

v/ic  +  1 


tion  number  of  A,  defined  as  /c(A)  —  ||  A  ||  - 1|  A~*  ||  (see  e.g.  {!]).  When  A  is  ill*conditioned  (i.e. 
k(A)  is  large),  one  can  solve  equation  (2)  and  choose  M  such  that  the  condition  of  A  is  small. 
A  well  chosen  M  will  require  only  a  few  iterations  of  the  CG  algorithm  to  reduce  the  residual 
norm  to  the  desired  level. 

There  are  two  different  approaches  that  one  can  take  in  constructing  preconditioning 
matrices; 

(i)  Incomplete  Factorization  of  the  Complete  A 

The  incomplete  Choleski  factorization  can  be  viewed  this  way.  Here  the  Choleski  factors  of  a 
positive  definite  matrix  A,  is  formed.  A  is  a  matrix  close  to  A  with  its  Choleski  factors  having 
the  same  non*zero  structure  as  the  lower  triangular  part  of  A.  It  should  be  noted  that,  in  general, 
A  is  denser  than  A  and  the  elements  of  A  are  different  from  the  elements  of  A. 

(ii)  Complete  F actorization  of  an  Incomplete  A. 

Here  one  removes  those  off-diagonal  elements  of  A  that  are  responsible  for  filUin  when  A  is  fac¬ 
tored.  An  example  such  of  approach  is  diagonal  scaling. 

These  points  of  view  are  merely  helpful  distinctions  of  preconditioning  methods.  They  do 
not  yield  two  separate  classes  of  preconditioning  matrices  as  the  following  example  shows:  let 


1-10  0 
-1  4  -1  -1 

*  0-1  3  -1  • 

0-1-1  7 

Then  one  obtains  the  same  preconditioning  matrix  M,  whether  one  uses  an  incomplete  factoriza¬ 
tion  with  the  factors  having  a  bi-diagonal  structure  or  whether  the  tridiagonal  part  of  A  is  used. 
Our  main  concern  is  general  methods  that  are  derived  using  the  second  approach. 
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We  give  an  outline  of  the  CG  algorithm  here.  Given  an  initial  guess  Xo  and  a  positive 
definite  matrix  M  the  preconditioned  CG  ^gorithm  (PCG  hereafter)  generates  a  sequence  of 
approximations  to  the  solution  x  as  follows: 

(I)  set  Po  =  To  =  b  -  Ax 

(II)  solve  Miq  s  ro, 

(in)  For  k  s=  0,  step  1  until  convergence  do 
(a)  at  =»  (i*t,it)/(Pi,Apt) 

(•>)  *1+1  =  Xt  +  atPt 

(c)  Ft+i  =  rt  -  atApt 

(d)  solve  Mit^.!  =  ft+i 

(e)  =»  (rt+i.»t+i)/('*,»t) 

(0  Pt+l  =•  •*+!  +  ^tPk 

A  nice  derivation  and  discussion  of  the  variants  of  the  basic  conjugate  gradient  algorithm 
can  be  found  in  (16|.  The  preconditioned  version  of  the  algorithm  is  explained  in  [3|. 

3.  Poaitive  DefInItenoM  of  M 

The  idea  of  using  the  complete  factorization  of  an  incomplete  A  has  been  raised  before. 
Methods  that  have  been  constructed  by  this  approach  include  diagonal  scaling,  line  methods  (8], 
which  can  be  considered  as  block  diagonal  preconditioning,  and  hybrid  method  [I0|  which  is  a 
blend  of  a  2X2  block  diagonal  factorization  into  CG  iteration.  The  reason  this  approach  has  not 
been  generalized  to  more  sophisticated  structures,  is  that  an  incomplete  A  is  not  positive  definite 
in  general.  Here  we  give  an  example  to  illustrate  this  fact. 

Esmsls 

&  4  l1  M 

Consider  the  3X3  matrices  Aa  4  4  1  and  Tj  4  4  1.  It  is  trivial  to  show  that 

b  1  ij  IP  1  IJ 

det(A3)  «  3  and  det(A2)  «  4,  where  Aj  is  the  first  principle  minor  of  Aj.  The  tridiagonal 
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matrix,  Tg,  obtained  by  omitting  the  appropriate  off-diagonal  elements  of  Aj,  is  indefinite.  In 
fact  det(T8)  «  -1. 

Now  we  try  to  establish  some  guidelines  which  will  help  us  to  identify  classes  of  positive 
definite  matrices  with  positive  definite  incomplete  part.  One  such  class  is  the  class  of  diagonally 
dominant  matrices.  Setting  any  pair  of  off-diagonals  symmetrically  to  zero  will  leave  the  matrix 
positive  definite  and  symmetric.  Another  class  is  given  by  the  symmetric  H-matrices  with  positive 
diagonal  (c.f.  (ll]). 

Matrices  arising  from  the  finite  element  discretization  of  elliptic  boundary  value  problems 
can  be  represented  in  the  form 

A  = 

•—1 

where  A,  represents  the  contribution  of  each  finite  element  to  the  global  matrix  A.  The  sum  is 
over  the  total  of  p  elements.  A,  can  be  written  more  precisely  as 

A,  N,  •,  N/ 

where  n,  is  a  small  element  stiffness  matrix  and  N,  is  an  n  X ;  boolean  connectivity  matrix  with 
j  «  n.  N,  are  formal  representations  of  the  way  in  which  the  elements  are  joined  together.  In 
general,  a,  is  singular  and  N,  has  full  rank.  For  elliptic  boundary  value  problems,  A,  is  a  sym¬ 
metric  positive  semi-definite  matrix  and  by  construction  A  must  also  be  symmetric  positive  semi- 
definite.  We  can  now  state  and  prove  the  main  proposition  concerning  the  positive  definiteness  of 
incomplete  A. 

Proposition-.  Let  A  ^  A,  be  a  symmetric  positive  definite  matrix  and  let  A  »  ^  A,  +  A„, 

|K| 

where  A^  nn<l  A, ,  i  ~  1,  ■  ■  ■  ,p ,  are  all  symmetric  positive  semi-definite.  Then  A  is  also  posi¬ 
tive  definite  if  N{A.„)  C  iV(A„),  where  N(A,)  denote  the  null  space  of  A,. 

Proof :  A  is  positive  definite  iffQN(A,)  is  trivial.  Since  IV(An)  C  iV(A„)  then 

I 

P 

Q  iV(At)p|iV(A^)  is  also  trivial  and  the  result  follows. 

■■4 


■^11 1 
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This  suggest  that  some  of  the  off-diagonal  elements  of  Am  can  be  removed  symmetrically  to  get 
A,  without  affecting  the  positive  definiteness  of  the  assembled  matrix  provided  that  the  null  space 
of  Am  is  contained  in  the  null  space  of  Am  and  Am  is  positive  semi-definite.  This  result  may 
seem  rather  cumbersome.  However,  as  the  following  simple  application  demonstrates,  it  is  very 
useful. 

Consider  the  structure  example  of  figure  la.  Each  element  consists  of  two  nodes  with  two 
unknowns  per  node;  the  horizontal  and  vertical  displacements.  The  resulting  system  of  equations 
has  a  total  of  10  unknows  (because  the  total  number  of  nodes  is  7  with  2  on  the  boundry  resulting 
in  only  5  active  nodes).  Each  element  makes  a  contribution  to  the  global  matrix  which  couples 
the  unknowns  at  the  two  nodes  of  the  element.  The  explicit  form  of  the  element  matrices  are 


where  c  =  costf,,  s  =>  sintf,;  $,  defines  the  oriantation  of  the  t-th  element  and  k,  is  a  positive 
quantity  depending  on  the  stiffness  and  the  length  of  the  each  element.  The  coupling  between  the 
unknows  at  each  node  is  eliminated  when  n,  is  replaced  by 


n,  is  positive  semi-definite  and  N(s,)  C  N(s,).  This  can  be  verified  easily  since  the  null  vectors 
of  n,  are  qi  ss  (  g,  c,  0  0  and  as  (  0  0  s,  c,  which  are  also  the  null  vectors  of 
a,.  Therefore,  by  the  above  proposition,  replacing  any  a,  by  a,  will  retain  the  positive 
definiteness  of  the  global  matrix.  For  example  when  this  is  done  to  the  elements  connected  to 
nodes  1  and  5  the  coupling  between  these  nodes  is  eliminated  as  shown  in  Figure  la. 
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Figure  la  Original  structure  and  its  associated  matrix. 
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Figure  lb  Tie-down  structure  and  its  associated  matrix. 


4.  Implementation 

Let  A  =  M  +  R,  M  positive  definite,  and  M  «  CC^  the  Choleski  factorization  of  M. 
Then  banded  preconditioning  is  best  implemented  by  applying  the  standard  CG  method  to  the 
preconditioned  problem 

(I  +  C-'RC-’’)y  =  C-'b  (4) 

with  y  ss  C^x,  instead  of  using  the  algorithm  of  section  2.  There  are  two  good  reasons  for  this. 

First  if  M  is  dense,  i.e.,  no  fill-in  results  when  M  is  factored,  then  the  Choleski  factor  C  can  be 

written  over  the  storage  locations  provided  for  M.  In  this  case,  the  preconditioned  algorithm  does 

not  need  any  extra  storage.  In  contrast,  incomplete  factorization  preconditioners  |15|  requires  at 

least  NZA  or  more  extra  storage  cells  in  order  to  hold  the  incomplete  factors.  Here  NZA  denotes 
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the  Dumber  of  zeros  ia  the  upper  triangular  part  of  A.  However,  we  should  add  that  a  certain 
class  of  incomplete  factorization  preconditiooers,  as  in  |7|,  require  less  storage  than  the  more  gen¬ 
eral  versions.  The  second  advantage  of  (4)  is  that  the  matrix  vector  product  (I  +  C"*RC'^)y 
needs  exactly  2NZA  multiply-adds,  when  M  is  dense.  This  only  is  n  operations  more  than  that 
needed  in  forming  Ay.  Hence,  if  M  does  not  contain  any  zeros  within  the  band,  the  banded 
preconditioning  is  not  more  expensive  per  step  than  the  standard  CG  algorithm.  When  M  does 
contain  some  zeros,  the  cost  may  be  more  but  marginally. 

There  is  an  alternative  way  of  utilizing  the  choleski  factors  of  M  for  preconditioning.  Note 

that 

A  =  C(I  +  C-*RC-’')C^  (5) 

Therefore 

A-‘  =  C-'(I  -f-  C-‘RC-^)-*C-‘  (6) 

Using  the  truncated  Neumann  series  approach  (-S],  we  can  find  an  approximation  H  to  A'*  by 

H  =  -  C-‘RC-^)C-*  (7) 

H  can  be  used  in  connection  with  the  PCG  algorithm  if  (11)  is  replaced  by 

lo  =  Hro  (8) 

and  (Ill.d)  is  replaced  by 

•*+i  =  (9) 

The  convergence  behavior  of  the  two  different  implementations  depends  on  the  spectrum  of 

A.  Denote  C"‘RC'^  by  W.  Then  behavior  of  (4)  is  determined  by  the  eigenvalues  of  I  -I-  W. 
Let  X  be  any  eigenvalue  of  W;  I  +  W  is  positive  definite,  since  A  and  M  are  both  positive 
definite,  then  1  +  X  >  0.  Hence  X  is  greater  than  -1.  Further,  if  the  stronger  condition 
-1  <  X  <  1  holds,  then  the  Neumann  series  can  be  applyed.  The  convergence  behavior  of  PCG 
with  the  matrix  of  (7)  as  the  preconditioner  is  governed  by  the  eigenvalues  of 
HA  C"^(I  -  W)(I  +  W)C*^.  This  has  the  same  eigenvalues  as  (I  -  W)(I  +  W)  =  (I  -  W^, 

which  are  1  -  X®.  Hence,  k(I  +  W)  —  (respectively  Xn^j)  is  the  largest 
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(smallest)  eigenvalue  of  W,  and  k(I  -  Mf®)  - - y,  where  Xj,  (respectively  X,)  is  the  eigenvalue 

1  -  Xf 

closest  to  0  (closest  to  1).  Hence  one  can  not  determine  a  priori,  which  of  the  the  above  imple¬ 
mentations  has  better  convergence  properties.  This  choice  depends  on  the  problem. 

The  preconditioner  matrix  of  (7)  can  be  implemented  by  applying  the  CG  method  directly 


(I  -  W2)y  =  (I  -  W)C-‘b  (10) 

where  y  s  C^x.  This  version  nearly  doubles  the  cost  of  each  CG  step,  but  may  be  less  prone  to 
orthogonality  loss.  However,  as  stated  above  ail  the  eigenvalues  of  W  must  have  a  magnitude 
smaller  than  unity  for  the  matrix  of  (10)  to  be  positive  definite.  Unless  the  problem  is  better 
suited  for  the  second  implementation,  the  approach  with  (4)  is  preferable. 

Finally,  we  would  like  to  note  that  the  algorithm  described  here  can  be  extended  to  the  case 
when  M  is  not  positive  definite.  In  this  case,  M  can  be  factored  into  LDL^  with  D  having  both 
positive  and  negative  terms.  A  preconditioning  matrix  of  the  form  LDL^  can  then  be  con¬ 
structed,  where  the  terms  in  D  are  the  absolute  values  of  the  corresponding  terms  in  D.  The 
preconditioned  problem  then  becomes 


(J  +  D  *L-‘RL-’'D  *)y  =  C’b 

where  now  y  a*  D  L^x,  and  J  is  a  diagonal  matrix  with  elements  1  or  -1. 


5.  Numerlexl  Examples 

The  PCG  with  various  forms  of  banded  preconditioning  was  used  in  solving  several  large 
sparse  symmetric  systems  of  linear  equations.  One  of  our  examples  arises  from  finite  element 
discretization  of  a  rectangular  plate.  The  other  two  examples  are  derived  from  finite  difference 
approximations  of  elliptic  partial  differential  equations.  The  examples  and  characteristics  of  each 
problem  are  des<;ribed  in  mote  detail  in  tables  1  and  2.  All  matrices  are  included  in  the  collection 
of  "Sparse  Matrix  Test  Problems”  (6). 


Example  No. 

Problem  description 

1 

Poisson's  equation  in  an  L-shape  region  with  mixed  boundary 
conditions.  The  same  problem  as  example  4  in  |9|,  with  less  grid 
points. 

2 

-V(nvtt)  /  in  the  unit  cube  in  with  Drichlet  boundary 

conditions,  a  varies  from  IIT^  to  lO**.  Finite  difference  discreti¬ 
zation  with  7-point  difference  star  and  9x9x9  grid  points  is 
used  with  a  random  right-hand  side. 

3 

Rectangular  plate  problem  with  one  side  fixed  and  the  others 
free.  A  unit  point  load  is  applied  at  one  of  the  free  corners. 

Table  1.  Description  of  the  Examples 


Example 

No.  of 
Equations 

No.  of 
Nonzeros 

Half 

Bandwidth 

Fill-in 

Condition 

Number 

1 

675 

1965 

30 

9929 

7.1X10® 

2 

729 

2673 

81 

40961 

1.8X10® 

3 

960 

8402 

43 

66612 

3.5X10* 

Table  2.  Properties  of  the  Test  Matrices. 


In  the  second  column  of  Table  2.,  we  list  the  number  of  nonzeros  in  the  upper  triangular 
part  of  the  matrices,  and  in  the  third  column  the  half  band  width;  no  attempt  was  made  to 
reduce  the  band  width  by  reordering  the  matrix,  in  the  fourth  column,  we  list  the  number  of 
nonzeros  in  the  triangular  factorization  after  reordering  the  original  matrix  by  the  minimum 
degree  algorithm. 

In  our  numerical  tests,  we  extracted  the  banded  part  of  the  test  matrices  for  various  band 
widths,  m.  The  Choleski  factors  were  computed  using  the  LINPACK  (4|  subroutine  DPBCO. 


We  applied  the  two  implemeatations  of  banded  preconditioning  to  our  test  problems.  We  com¬ 
pared  our  algorithms  with  the  basic  CG  algorithm  with  diagonal  scaling  and  with  subroutine 
MA31  from  the  Harwell  Library  [15|.  MA3I  uses  an  incomplete  factorization  of  the  matrix  as 
preconditioning  with  a  drop  tolerance,  e.  we  used  e  »  ~1.0  and  c  «  0.1;  the  hrst  choice  allows 
no  fill-in  in  the  factorization  where  as  the  second  permits  all  fill-in  whose  magnitude  is  greater 
than  0.1  relative  to  the  diagonal  elements.  The  tolerance  may  change  in  the  routine  depending  on 
the  amount  of  storage  available  to  the  program. 

The  results  of  the  tests  are  given  in  the  tables  below.  For  each  example,  the  tables  list  the 
half-band  width  m  and  the  choice  of  e  for  MA31,  the  number  of  CG  steps,  the  time  in  seconds 
for  the  two  phases  of  the  algorithm  and  the  total  time.  In  these  tables.  Methods  1  and  2  refer 
respectively  to  the  simple  banded  preconditioning  and  banded  preconditioning  with  the  truncated 
Neumann  series.  .\11  tests  were  carried  out  in  double  precision  on  a  UNIV.\C  1100.  Our  termina¬ 
tion  criterion  was  a  reduction  of  the  residual  norm  to  a  factor  of  lO'*  of  its  initial  value. 


■i 

Time  for 

Time  for 

Total 

Method 

Bi 

CG  steps 

factor 

CG  Iter. 

time 

Basic  CG 

- 

17 

• 

1  30 

1  30 

Method  1 

1 

13 

0.31 

1  38 

1  69 

15 

038 

1  45 

1  81 

Method  2 

1 

8 

031 

I  36 

1  67 

2 

8 

0.36 

1  43 

1  79 

MA31 

-1.0 

2 

0.44 

0.22 

066 

0.1 

1 

0.46 

0  15 

061 

.1  a..  H  Results  for  Example  1. 


I 


Method 


CG  steps 


Time  for 
factor 


Time  for 
CG  iter. 


Total 

time 


Basic  CG 


Method  1 

1 

85 

0.32 

2 

85 

0.38 

8 

85 

0.73 

9 

63 

0.98 

Method  2 

1 

43 

0.32 

o 

43 

0.38 

8 

43 

0.38 

9 

32 

0.98 

MA31 

-1.0 

56 

0.62 

0.1 

25 

0.64 

Table  4.  Results  for  Example  2. 


■i 

Time  for 

Time  for 

Total 

Method 

IB 

CG  steps 

factor 

CG  iter. 

time 

Basic  CG 

- 

>  200 

- 

(27.37) 

(27.37) 

Method  1 

1 

>  200 

0.43 

(88.28) 

(88.71) 

<1 

111 

0.51 

49.36 

49.87 

MA31 

-I.O 

>  200 

2.52 

(105.19) 

0.1 

>  200 

2.67 

(107.87) 

Table  5.  Results  for  Example  3. 


Several  conclusioas  caa  be  drawn  from  the  above  results: 


I 
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(i)  Choice  of  the  bandwidth.  Clearly  the  choice  of  the  bandwidth  has  to  depend  on  the  physical 
structure  of  the  underlying  problem.  For  Example  2,  sensible  chokes  of  m  are  m  =  1,  m  s  9, 
m  as  27,  because  of  the  underlying  structure  of  the  matrix.  The  new  information  included  in  the 
band  then  leads  to  a  drop  in  the  number  of  CG  steps  and  possibly  in  the  total  cost.  As  it  turns 
out  in  Example  2,  m  a  9  is  already  more  expensive  than  m  »  1.  m  2  can  not  yield  an 
improvement  over  m  ^  1,  since  the  number  of  nonzeros  of  the  preconditioning  is  not  increased. 
We  would  like  to  note  that  by  reordering  the  matrix  it  is  possible  to  produce  better  precondition* 
ers. 

Similar  considerations  holds  for  examples  1  and  3.  In  Example  1  the  next  reasonable  choice 
would  have  been  m  =  30  and  in  example  3  m  »  40.  These  were  not  tried  since  the  experience 
from  example  2  seems  to  indicate  that  large  m  (m  >  10  for  example  2)  will  not  improve  the  time 
for  CG  iteration  to  warrant  the  increase  cost  of  factorization,  unless  the  matrix  is  reordered. 

(ii)  Comparison  of  the  two  impUmentatione.  In  the  examples  considered  here  there  appears  to  be 
no  significant  difference  in  the  overall  performance  of  the  two  methods  described  here.  It  turns 
out  that  the  use  of  Neumann  series  results  in  a  slightly  faster  solution  time  for  these  examples. 
However,  because  of  the  arguments  given  in  section  4,  the  first  implementation  is  preferred. 

(Hi)  Banded  PCG  versus  Incomplete  Factorization.  We  can  draw  no  definitive  conclusion  for  this 
comparison.  In  the  first  two  examples,  Ma  31  was  about  twice  as  fast  as  banded  PCG.  However, 
in  Example  3  \L\31  was  not  able  to  find  a  solution  within  200  steps,  whereas  the  simple  banded 
PCG  solved  this  difficult  problem  remarkably  well. 

The  numerical  examples,  especially  example  3,  show  that  the  banded  PCG  is  a  competitive 
alternative  to  preconditionings  based  on  incomplete  factorizations.  It  has  several  advantages 
which  are  not  clearly  reflected  in  the  above  tables  which  should  be  stressed  here; 

•  The  storage  demands  is  the  same  or  only  slightly  more  than  the  standard  CG  method. 

•  The  cost  of  one  CG  iteration  does  not  increase  when  using  this  preconditioner, 

s  The  factorization  of  M  can  be  performed  using  existing  codes. 

e  Conceptual  simplicity. 


f 
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